Synset Assignment for Bi-lingual Dictionary with Limited Resource
Abstract
This paper explores automatic WordNet synset assignment for bilingual dictionaries of languages with limited lexical information. A term in such a dictionary typically comes with very limited information: its part of speech, a set of synonyms, and a set of English equivalents. Dictionaries of this type are comparatively reliable and are available in electronic form from various publishers. We propose an algorithm that applies a set of criteria to assign a synset, with an appropriate degree of confidence, to each entry of an existing bilingual dictionary. We show that synset candidates can be nominated efficiently using only the most common lexical information. The algorithm is evaluated on Thai-English, Indonesian-English, and Mongolian-English bilingual dictionaries. The experiment also shows the effectiveness of using dictionaries of the same type from different sources.
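The abstract does not spell out the assignment criteria, so the following is only a minimal sketch of the general idea under stated assumptions: each English equivalent of an entry nominates the synsets that contain it (restricted to the entry's part of speech), and a synset's confidence is the fraction of equivalents that nominate it. The inventory `TOY_SYNSETS`, its synset IDs, the function name `assign_synsets`, and the voting rule are all hypothetical illustrations, not the paper's actual criteria or real WordNet data.

```python
from collections import Counter

# Hypothetical toy inventory: (English lemma, POS) -> set of made-up synset IDs.
# A real system would look these up in WordNet instead.
TOY_SYNSETS = {
    ("dog", "n"): {"syn:canine_animal", "syn:unattractive_person"},
    ("domestic dog", "n"): {"syn:canine_animal"},
    ("hound", "n"): {"syn:hunting_dog"},
}


def assign_synsets(equivalents, pos, inventory):
    """Rank candidate synsets for one dictionary entry.

    Assumed criterion (not from the paper): confidence is the share of the
    entry's English equivalents that appear in the candidate synset.
    """
    if not equivalents:
        return []
    votes = Counter()
    for lemma in equivalents:
        # Each equivalent nominates every synset it belongs to for this POS.
        for synset_id in inventory.get((lemma.lower(), pos), ()):
            votes[synset_id] += 1
    total = len(equivalents)
    ranked = [(synset_id, count / total) for synset_id, count in votes.items()]
    # Highest confidence first; ties broken alphabetically for determinism.
    ranked.sort(key=lambda item: (-item[1], item[0]))
    return ranked


if __name__ == "__main__":
    # An entry whose English equivalents are "dog" and "domestic dog".
    for synset_id, confidence in assign_synsets(["dog", "domestic dog"], "n", TOY_SYNSETS):
        print(synset_id, confidence)
```

In this sketch, a synset shared by all English equivalents gets the highest confidence, while one nominated by a single ambiguous equivalent ranks lower. Synonyms in the source language could be folded in the same way by pooling their English equivalents, though that is again an assumption rather than something the abstract states.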
Related resources
Thai WordNet Construction
This paper describes the semi-automatic construction of a Thai WordNet and the method applied for the Asian WordNet. Based on the Princeton WordNet (PWN), we develop a method for generating a WordNet from an existing bilingual dictionary. We automatically align each PWN synset to the bilingual dictionary through the English equivalent and its part of speech (POS). Manual translation is also employed after th...
Problems and Procedures to Make Wordnet Data (Retro)Fit for a Multilingual Dictionary
The data compiled through many Wordnet projects can be a rich source of seed information for a multilingual dictionary. However, the original Princeton WordNet was not intended as a dictionary per se, and spawning other languages from it introduces inherent ambiguity that confounds precise inter-lingual linking. This paper discusses a new presentation of existing Wordnet data that displays join...
Enlarging the Croatian WordNet with WN-Toolkit and Cro-Deriv
Wordnet is a standard semantic resource for several Natural Language Processing tasks and it is available for an increasing number of languages. The Croatian Wordnet (CroWN) was a relatively small resource with 10,026 synsets and 31,367 synset-variant pairs covering only 45.91% of the so-called Core WordNet. Comparing these figures with the size of the Princeton WordNet for English version 3.0,...
Automatic Discovery of Fuzzy Synsets from Dictionary Definitions
In order to deal with ambiguity in natural language, it is common to organise words, according to their senses, in synsets, which are groups of synonymous words that can be seen as concepts. The manual creation of a broad-coverage synset base is a time-consuming task, so we take advantage of dictionary definitions for extracting synonymy pairs and clustering for identifying synsets. Since word s...
A Cross-Lingual Dictionary for English Wikipedia Concepts
We present a resource for automatically associating strings of text with English Wikipedia concepts. Our machinery is bi-directional, in the sense that it uses the same fundamental probabilistic methods to map strings to empirical distributions over Wikipedia articles as it does to map article URLs to distributions over short, language-independent strings of natural language text. For maximal i...
Publication date: 2008